Abstract

The field of Prognostic Health Management (PHM) has been undergoing
rapid growth in recent years, with development of increasingly
sophisticated techniques for diagnosing faults in system components
and estimating fault progression trajectories. Research efforts on how
to utilize prognostic health information (e.g. for extending the
remaining useful life of the system, increasing safety, or maximizing
operational effectiveness) are mostly in their early stages, however.
The process of using prognostic information to determine a system's
actions or its configuration is beginning to be referred to as
Prognostic Decision Making (PDM). In this paper we propose a
formulation of the PDM problem with the attributes of the aerospace
domain in mind, outline some of the key requirements for PDM methods,
and explore techniques that can be used as a foundation of PDM
development. The problem of satisfying the performance goals set for
specific objective functions is discussed next, followed by ideas for
possible solutions. The ideas, termed Dynamic Constraint Redesign
(DCR), have roots in the fields of Multidisciplinary Design
Optimization and Game Theory. Prototype PDM and DCR algorithms are
also described and results of their testing are presented.

1.  Introduction 

As aerospace vehicles become more complex and their missions more
demanding, it is becoming increasingly challenging for even the most
experienced pilots, controllers, and maintenance personnel to analyze
changes in vehicle behavior that can indicate a fault and to
accurately predict the short- and long-term effects that the fault can
produce. For this reason, some of the latest vehicle designs are
beginning to incorporate automated fault diagnostic and prognostic
methods that can assist with these tasks (Janasak & Beshears, 2007;
Benedettini, Baines, Lightfoot, & Greenough, 2009; Reveley, Leone,
Briggs, & Withrow, 2010; Delgado, Dempsey, & Simon, 2012). The
research into how to utilize prognostics-enabled health information in
making autonomous or semi-autonomous decisions on system
reconfiguration or mission replanning is still in its early stages,
however.

There are other fields (e.g., operations research, medicine, financial
analysis, and climatology) where computer-assisted Prognostic Decision
Making (PDM) can play or already plays a role - even if the
terminology used for it is different (see, for instance, (Raisanen &
Palmer, 2001), (Wang & Zhu, 2008), or (Kasmiran, Zomaya, Mazari, &
Garsia, 2010)). While the fundamentals of PDM methods for these fields
are likely to be similar, we believe that there are important reasons
to examine how such methods should be developed and used specifically
in the context of aerospace.

First, we believe that PDM development needs to be informed by the
unique set of aerospace domain characteristics, where the operating
environment is often harsh and dynamic, systems are highly complex,
and an incorrect decision can lead to loss of life. Conversely, it
would be beneficial to inform vehicle design by the needs and
capabilities of PDM algorithms. This includes computing requirements,
sensor suite selection, component redundancy considerations, operating
procedures, and communication architectures. A capable (and
appropriately verified and validated) PDM system can expand both
design and operating options for an aerospace vehicle in much the same
way as a new composite material can do for its structure or a new type
of fuel can do for its propulsion system.

We foresee a number of use cases for PDM in aerospace applications,
with some possibilities listed below:

•  Maintenance  and  supply  chain  management 

•  Safety  assurance  for  manned  aircraft  and  spacecraft 


•  Mission effectiveness maximization for unmanned vehicles

[Report documentation (Standard Form 298): "An Approach to Prognostic
Decision Making in the Aerospace Domain"; report date September 2012;
performing organization: NASA Ames Research Center, Moffett Field, CA
94035; approved for public release, distribution unlimited. Presented
at the Annual Conference of the Prognostics and Health Management
Society (PHM 2012), held in Minneapolis, Minnesota, on September
23-27, 2012; sponsored by the Office of Naval Research (ONR). See also
ADA581775.]

In this paper we propose a set of general properties that problems of
interest to PDM researchers may have and consider how methods from the
fields of mathematical optimization, multidisciplinary design
optimization, and game theory can be utilized in the development of
PDM systems for aerospace. The discussion will primarily center on
certain elements of mission-, vehicle-, and subsystem-level reasoning;
however, we believe that the longer-term goal of PDM development
should be the creation of distributed, yet comprehensively
interconnected systems that support information flow from the highest
(e.g. fleet) levels down to the individual vehicle components - and
back. To achieve that goal, four main areas will need to be addressed:
(1) approaches for effective system (problem) decomposition into
subproblems; (2) decision-making problem formulations for different
types of subproblems; (3) decision-making methods appropriate for the
subproblem types; (4) methods for adjusting problem formulations (such
as constraints) in real time, if necessitated by prognostic
predictions in off-nominal situations.

The  paper  is  organized  around  the  following  objectives: 

•  Provide  some  motivating  examples  for  considering  PDM 
in  the  context  of  aerospace  engineering  (Section  2) 

•  Identify  some  of  the  more  challenging  problems  in 
aerospace  decision-making  and  outline  the  requirements 
such  problems  can  impose  on  PDM  methods  (Section  3) 

•  Provide the definitions used in this work and formulate the problem
class of interest from a constrained optimization point of view
(Section 5)

•  Outline some of the potential approaches to solving the formulated
class of problems (Section 6)

•  Discuss the types of situations where a problem formulation may
need to be adjusted in real time and suggest some approaches to doing
that (Section 8)

•  Describe prototype algorithms for generating PDM solutions and
adjusting system constraints (Section 7 and Section 9, respectively)

•  Demonstrate the algorithms on example scenarios involving a
planetary rover (Section 11)

Additionally, Section 4 contains a review of related prior efforts,
and Section 10 describes the software/hardware testbed used in the
experiments. The paper concludes with a summary of findings and an
outline of potential directions for future work.

2.  Motivating  Examples 

Before we describe the problem class of interest for our current work,
it may be helpful to consider a few motivating examples. They are
chosen to illustrate the use cases listed in the Introduction. While
only three examples are mentioned here, the field of aerospace
certainly has no shortage of them.

2.1.  A surveying UAV

Our first example is an electrically-powered surveying UAV, such as
the SWIFT (Denney & Pai, 2012). The SWIFT is currently in development
at NASA Ames Research Center. This example is meant to illustrate the
first and the third use cases, that is, where PDM could be an integral
part of maintenance and logistics operations, as well as used for
contingency management if degradation of one of the components crosses
into the fault region during the mission.

Description 

•  The UAV performs surveying missions over a defined area (e.g.,
earthquake fault zone mapping, pipeline monitoring, or air sampling)

•  Maintenance for degrading or damaged components needs to be
scheduled and replacement parts need to be ordered. Each of the
objectives listed below has an importance value associated with it
that can change from mission to mission or even within the same
mission (if, for instance, an in-flight fault or failure occurs).

Objectives 

•  Maximize  the  number  of  measurements  or  area  coverage 
per  mission 

•  Maximize  vehicle  availability  for  missions 

•  Maximize  safety 

•  Minimize  operational  costs 

Constraints 

•  Airspace  restrictions 

•  Battery  capacity 

•  Component  operating  limits 

•  Return  to  point  of  launch  (desirable) 

2.2.  United  Airlines  Flight  232 

The  second  use  case  (safety  assurance  for  manned  vehicles) 
is  illustrated  with  the  example  of  United  Airlines  Flight  232 
from  Denver  to  Chicago  in  1989  (NTSB,  1989): 

Description 

•  A fan disk in one of the three engines of the DC-10 aircraft failed
and disintegrated

•  Fan disk shrapnel disabled the presumably redundant hydraulic
controls

•  The crew resorted to using differential thrust on the remaining two
engines to steer the aircraft to an emergency landing
Objectives

•  Minimize injuries and fatalities

Constraints

•  Component capabilities and safety margins

•  Location and configuration of potential emergency landing sites

•  Availability of emergency services at the sites

2.3.  Hayabusa (MUSES-C) spacecraft

The example of JAXA's Hayabusa spacecraft (Kawaguchi, Uesugi, &
Fujiwara, 2003) illustrates the third use case and is interesting for
a number of reasons. While it became the first mission to return
samples from an asteroid (Itokawa), it was primarily a technology
development mission, with engineering goals assigned point values
pre-launch (Table 1). Due to long communication delays during certain
phases of the mission, autonomous operation was utilized extensively.
Several problems jeopardized mission objectives, however, and required
numerous changes to the mission plan and the configuration of the
spacecraft.

Table 1.  Pre-launch mission goals for Hayabusa¹

Pre-launch mission goal                                  Points
Operation of ion engines                                     50
Operation of ion engines for more than 1000 hours           100
Earth gravity assist with ion engines                       150
Rendezvous with Itokawa using autonomous navigation         200
Scientific observations of Itokawa                          250
Touch-down and sample collection                            275
Capsule recovered                                           400
Samples obtained for analysis                               500

¹ Reproduced from http://www.isas.jaxa.jp/e/enterp/missions/hayabusa/today.shtml

Description

•  A large solar flare damaged solar cells en route to the asteroid

•  Reduction in electrical power negatively affected the efficiency of
the ion engines

•  Two reaction wheels (X and Y) failed

•  Release of MINERVA mini-probe failed

Objectives

•  Maximize engineering and scientific payoff

Constraints

•  Component capabilities and safety margins

•  Orbital mechanics

•  On-board propellant amount

3.  Problem Class of Interest and Requirements


As discussed in the Introduction, we believe that PDM systems will
eventually need to support decomposition of the overall problem into
smaller problems on different levels of system abstraction. Some of
these smaller problems could potentially be solved with the more
traditional decision-making techniques, such as model-predictive
control or partial-order planning. While investigating the use of such
techniques in the context of prognostic decision-making would
certainly be worthwhile, in order to narrow down the scope of this
work we focus on the class of problems for which decision-making
methods may not yet be sufficiently developed. The examples in the
previous section (and others like them) allow us to outline the
general attributes of the class:

Attributes  of  the  problem  class  of  interest 

•  The system under consideration is complex, consisting of multiple
distinct components

•  The operating environment is complex and dynamic

•  The system may experience degradation processes, due to either
external or internal factors, that lead to faults that can be
considered significant. Fault magnitudes and secondary effects may
evolve over time.

•  In case of a fault (or faults), a decision on mitigation actions is
required within a limited amount of time

Requirements 

The following high-level requirements could then be proposed for the
PDM methods for solving such problems:

1.  Should be general and adaptable

It may not be possible to define even partial solutions a priori for
specific combinations of system state, environmental conditions,
constraints, and objectives.

2.  Should utilize prognostic information, if available

While offering the benefits of an insight into a future system state,
incorporation of prognostic capability may also result in a
substantial increase in computational complexity. In practice,
obtaining prognostic information could require execution of a
computationally-expensive simulation for each potential solution.

3.  Shall accommodate uncertainty and inconsistency in input data

Input data available in aerospace applications often suffers from
noise, drop-outs, uncertainty of accuracy, and other issues.

4.  Should support system decomposition

The ability to account for condition, objectives, and constraints of
individual subsystems and components can result in increased solution
quality. Carrying out decision-making in a distributed fashion can
also be beneficial from the performance point of view.

5.  Should not depend on knowing objective function properties

Objective functions (defined in Section 5) may not be guaranteed to be
convex or differentiable, for example, thus 'blackbox' reasoning
techniques may need to be utilized.

6.  Shall be time-boundable

In most cases a valid solution will be required within a prescribed
period of time. In some circumstances the system will also be required
to be interruptible, i.e. capable of supplying a valid solution even
if the decision-making process is interrupted before the originally
specified time interval has elapsed.

7.  Shall support multi-action solution generation

In addition to being able to generate single-action solutions, such as
setting controller gain values, the system needs to be able to
generate multi-action solution sequences.

8.  Should support multiple objectives

This requirement is motivated by scenarios where, for instance,
failure risk is to be minimized while mission payoff is to be
maximized. It is also applicable to cases where the condition of
multiple subsystems or components needs to be taken into account.

A subset of these requirements (High-dimensional, Expensive
(computationally), Blackbox) is sometimes referred to in the
literature as HEB (Shan & Wang, 2009).
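Requirements 5 and 6 can be illustrated together with a minimal sketch of an interruptible ("anytime") search over a blackbox objective. This is not one of the algorithms described in this paper; plain random sampling is used only because it makes no assumptions about convexity or differentiability. The function name, its parameters, and the example objective are hypothetical.

```python
import random
import time
from typing import Callable, List, Optional, Sequence, Tuple


def anytime_random_search(
    objective: Callable[[Sequence[float]], float],
    bounds: Sequence[Tuple[float, float]],
    time_budget_s: float,
    max_evals: int = 10_000,
    seed: int = 0,
) -> Tuple[Optional[List[float]], float, int]:
    """Minimize a blackbox objective within a time budget.

    Returns (best_x, best_value, evaluations). The incumbent is kept
    after every evaluation, so stopping at any point still yields the
    best candidate seen so far.
    """
    rng = random.Random(seed)
    deadline = time.monotonic() + time_budget_s
    best_x: Optional[List[float]] = None
    best_val = float("inf")
    evals = 0
    while evals < max_evals:
        # Sample one candidate uniformly inside the box constraints.
        x = [rng.uniform(lo, hi) for lo, hi in bounds]
        val = objective(x)  # possibly an expensive simulation
        evals += 1
        if val < best_val:
            best_x, best_val = x, val
        # The deadline is checked after the evaluation, so at least one
        # candidate is always returned, even with a zero budget.
        if time.monotonic() >= deadline:
            break
    return best_x, best_val, evals


def example_objective(x: Sequence[float]) -> float:
    # Hypothetical non-convex, non-differentiable objective.
    return abs(x[0]) + (1.0 if x[1] > 0.5 else 0.0)


if __name__ == "__main__":
    x, val, n = anytime_random_search(
        example_objective, [(-1.0, 1.0), (0.0, 1.0)], time_budget_s=0.05
    )
    print(f"best value {val:.4f} after {n} evaluations")
```

If each evaluation requires a prognostic simulation (requirement 2), the number of candidates that fit into the time budget drops accordingly, which is the tension captured by the HEB label.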

4.  Prior  Work 

Before moving on to describing the initial approach we chose to take
in developing decision-making methods, we will review some of the
prior related efforts. The research efforts described in this section
were chosen from several different fields where prognostic-style
information is used for system action determination and we believe
them to be representative of the current state of the art.

4.1.  Prognostics-enhanced  control 

Pereira et al. propose a Model Predictive Control (MPC) approach for
actuators that distributes control effort among several redundant
units (Pereira, Galvao, & Yoneyama, 2010). Redistribution is performed
based on prognostic information on their deterioration. A degradation
model of the plant is used that represents damage accumulation to be
proportional to the exerted control effort $u$ and its variation
$\Delta u$. Bogdanov et al. (Bogdanov, Chiu, Gokdere, & Vian, 2006)
investigate coupling of a prognostic lifetime model for servo motors
with a family of LQR controllers. External load disturbances on the
servo are assumed to be stochastic.

In (D. W. Brown, Georgoulas, & Bole, 2009) Brown et al. report on a
prognostics-enhanced fault-tolerant controller that trades off
performance for remaining useful life (RUL). The controller is based
on MPC principles, with control boundaries for $t_{RUL}$ corresponding
to a particular input $u_{RUL}$ used as soft cost constraints. The
work is extended with error analysis and estimation of uncertainty
bounds for long-term RUL predictions in (D. W. Brown & Vachtsevanos,
2011). In (Bole, Tang, Goebel, & Vachtsevanos, 2011) Bole et al. also
study optimal load allocation given prognostic data about fault
magnitude growth (including uncertainty bounds on the prediction). The
concept of Value at Risk (VaR), coming from the field of finance, is
used as the key performance metric. The case study used in the
experiments is an unmanned ground vehicle (UGV) that experiences
winding insulation degradation in the drive motors due to thermal
stress.

4.2.  Post-prognostic  decision  support  and  condition-based 
maintenance 

Iyer et al. use the term post-prognostic decision support to describe
their framework for Pareto set generation and interactive expression
of user preferences throughout the process (Iyer, Goebel, & Bonissone,
2006). The approach is illustrated with a logistics planning example,
where mission assets need to be allocated based on the estimated state
of health of an asset and the projected availability of replacement
parts. An exhaustive search technique was used as the optimization
method in the experiments, with the intention to replace it with a
genetic algorithm in the future.

In (Haddad, Sandborn, & Pecht, 2011b) and (Haddad, Sandborn, & Pecht,
2011a) Haddad et al. present a prognostics-enabled optimization model
for maximizing availability of an offshore wind farm. The model is
based on Real Options Analysis (ROA) and stochastic dynamic
programming. The concept of ROA also comes from the field of finance
and refers to analysis over either real, tangible assets or
opportunities for cost avoidance. The method is illustrated with an
example where an optimum subset of turbines to be maintained needs to
be found, given the information on their degradation, availability
requirements, and cost constraints.

4.3.  Automated  contingency  management 

The work done by Tang, Edwards, Orchard, and others on Automated
Contingency Management (ACM) includes elements of prognostics-enhanced
control, but also extends to prognostic mission replanning (Tang et
al., 2007; Edwards, Orchard, Tang, Goebel, & Vachtsevanos, 2010; Tang,
Hettler, Zhang, & Decastro, 2011). Diagnostic and prognostic
algorithms for various component types were developed and integrated
into a prototype decision-making framework for an unmanned ground
vehicle (UGV). RUL estimates were used either as a constraint or as an
additional element in the cost function of the path-planning
algorithm. A Field D*-style search routine was used for receding
horizon planning. Methods for estimating and managing process
uncertainty were also developed.

5.  Definitions  and  Problem  Formulation 

In this section we provide the definitions of the concepts used in the
rest of this work and represent the problem class described in Section
3 in terms of Partially Observable Markov Decision Processes. The
definitions generally follow the conventions found in the contemporary
prognostic health management, optimization, game-theoretic, and
decision-making literature, with some exceptions as noted. In
combining notation conventions used in several different fields, some
of the terms had to be assigned symbols that may not be typical for
them.


5.1.  System

The term system in this set of definitions is used in a sense similar
to the term plant in control theory. It can refer to a single
component or to the entire vehicle, depending on the context. The
system is modeled as a constrained, factored, discrete-time Partially
Observable Markov Decision Process (POMDP). A POMDP (or, in some
cases, the more traditional Markov Decision Process) is often used to
represent decision-making under uncertainty and with incomplete
information about the system (Peek, 1998; Malikopoulos, 2007; Bryce &
Cushing, 2007; Boularias, 2010; Bole, 2012). We define a POMDP as a
tuple $\{S, A, Z, b_0, T, O, R\}$, with the components explained
below:

$S$                             A finite set of partially observable states,
                                $S = \{s_1, s_2, \ldots, s_{|S|}\}$

$A$                             A finite set of possible actions,
                                $A = \{a_1, a_2, \ldots, a_{|A|}\}$

$Z$                             A finite set of observations,
                                $Z = \{z_1, z_2, \ldots, z_{|Z|}\}$

$b_0$                           An initial set of beliefs

$T: S \times A \to P(S)$        A state transition function, for each state and
                                action giving a probability distribution over
                                next states, $T(s, a, s') = p(s' \mid s, a)$

$O: A \times S \to P(Z)$        An observation probability function (sensor
                                model), $O(z, a, s') = p(z \mid a, s')$

$R: S \times A \to \mathbb{R}$  A reward function

5.2.  State Variables

Additionally, a vector of state variables

$X = \{x_1, x_2, \ldots, x_{|X|}\}$

is defined, along with a set of constraints on them:

$C(X) = \{c_1(X) \geq 0, c_2(X) \geq 0, \ldots, c_{|C|}(X) \geq 0\}$.

5.3.  Decision Variables

A set of decision variables $U$, the values of which can be
controlled, is defined as well:

$U = \{u_1, u_2, \ldots, u_{|U|}\}$.

Each $u_i \in U$, $i = 1, 2, \ldots, |U|$, is coupled with a domain
$D_i$ over which it is defined. The following inequality and equality
constraint sets are specified for the decision variables:

$G(U) = \{g_1(U) \geq 0, g_2(U) \geq 0, \ldots, g_{|G|}(U) \geq 0\}$,
$H(U) = \{h_1(U) = 0, h_2(U) = 0, \ldots, h_{|H|}(U) = 0\}$.
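To make the POMDP tuple of Section 5.1 more concrete, the following minimal sketch stores it in a small data structure and applies the standard Bayesian belief update after one action and one observation. The belief update is textbook POMDP machinery and is not claimed to be the procedure used by the authors; the observation model is assumed to be conditioned on the action and the resulting state, and the two-state health model with its probabilities is purely hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class POMDP:
    states: List[str]                      # S
    actions: List[str]                     # A
    observations: List[str]                # Z
    b0: Dict[str, float]                   # initial belief over S
    T: Dict[Tuple[str, str, str], float]   # (s, a, s') -> p(s' | s, a)
    O: Dict[Tuple[str, str, str], float]   # (z, a, s') -> p(z | a, s')
    R: Dict[Tuple[str, str], float]        # (s, a) -> reward


def belief_update(m: POMDP, b: Dict[str, float], a: str, z: str) -> Dict[str, float]:
    """Return the posterior belief after taking action a and observing z."""
    unnormalized = {}
    for s_next in m.states:
        # Prediction step: probability of reaching s_next under action a.
        predicted = sum(m.T.get((s, a, s_next), 0.0) * b[s] for s in m.states)
        # Correction step: weight by the likelihood of the observation.
        unnormalized[s_next] = m.O.get((z, a, s_next), 0.0) * predicted
    norm = sum(unnormalized.values())
    if norm == 0.0:
        raise ValueError("observation has zero probability under the model")
    return {s: p / norm for s, p in unnormalized.items()}


# Hypothetical two-state component health model (illustrative values only).
model = POMDP(
    states=["healthy", "degraded"],
    actions=["drive"],
    observations=["ok", "warn"],
    b0={"healthy": 1.0, "degraded": 0.0},
    T={
        ("healthy", "drive", "healthy"): 0.9,
        ("healthy", "drive", "degraded"): 0.1,
        ("degraded", "drive", "degraded"): 1.0,
    },
    O={
        ("ok", "drive", "healthy"): 0.8,
        ("warn", "drive", "healthy"): 0.2,
        ("ok", "drive", "degraded"): 0.3,
        ("warn", "drive", "degraded"): 0.7,
    },
    R={("healthy", "drive"): 1.0, ("degraded", "drive"): 0.2},
)

if __name__ == "__main__":
    print(belief_update(model, model.b0, "drive", "warn"))
```

In this example a single "warn" observation moves the belief in the degraded state from 0 to 0.28, which shows how incomplete sensor information enters the decision process.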


5.4.  Decision  Making 

A policy $\pi$ is defined as a function mapping POMDP states to
actions, $\pi: S \to A$, with $\Pi$ defined as the set of all possible
policies.

Decision-making in the context of this work is defined as the process
of determining a policy $\pi$ and/or the values of decision variables
in $U$. For policies, a decision $\delta(\pi) = \{a_1, a_2, \ldots,
a_n\}$ is defined as the solution to the POMDP (corresponding to a
policy $\pi$) and is described as an ordered set of actions.

A feasible or satisfactory policy $\pi_s$ is defined as a policy for
which $\delta(\pi_s)$ is such that no $C(X)$ are violated in any of
the states achieved. $\Pi_f$ is the set of all feasible policies.

If, additionally, objective functions and an objective vector are
defined:

$F(\pi) = \{f_1(\pi), f_2(\pi), \ldots, f_{|F|}(\pi)\}$,

then the optimal policy can be defined as:

$\pi_o = \pi_f : \min F(\pi_f)$,

where every objective function is reaching its minimum (best) value.
Note that a general assumption of multiple objectives and, therefore,
multiple objective functions is made.

Finding this strictly optimal (often called ideal or utopian) policy
in practice is usually not possible. Therefore the concept of a
compromise policy that achieves good results for the entire objective
vector, while possibly not minimizing any particular objective
function, is utilized. This concept, known as Pareto optimality, is
used widely in economics, operations research, and engineering.

A Pareto optimal policy is defined as a policy that is not dominated
by any other policy in $\Pi$. A vector $\alpha = \{\alpha_1, \alpha_2,
\ldots, \alpha_k\}$ is defined to dominate vector $\beta = \{\beta_1,
\beta_2, \ldots, \beta_k\}$ if and only if it is partially less than
$\beta$:

$(\forall i \in [1, 2, \ldots, k],\ \alpha_i \leq \beta_i) \wedge (\exists j \in [1, 2, \ldots, k] : \alpha_j < \beta_j)$.

Dominance of $\alpha$ over $\beta$ is conventionally denoted as
$\alpha \prec \beta$. Policy $\pi^* \in \Pi$ is then Pareto optimal if
and only if, with $K = |F|$:

$\neg \exists \pi' \in \Pi : (\forall i = 1, 2, \ldots, K,\ f_i(\pi') \leq f_i(\pi^*)) \wedge (\exists j = 1, 2, \ldots, K : f_j(\pi') < f_j(\pi^*))$.

$\pi^*$ is rarely unique, and, therefore, a Pareto set (also known as
Pareto front or Pareto frontier) is defined as:

$\Pi^* = \{\pi \in \Pi \mid \neg \exists \pi' \in \Pi,\ \pi' \prec \pi\}$.

A representation of a Pareto front for two objective functions is
provided in Figure 1. Note, in particular, that a Pareto front should
not be assumed to be continuous or convex.
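The dominance relation and the Pareto set defined in this section can be sketched in a few lines, assuming that every objective is to be minimized and that each candidate policy has already been evaluated into a vector of objective values. The function names and the example vectors are hypothetical; the pairwise filter is quadratic in the number of candidates and is meant only as an illustration of the definitions.

```python
from typing import List, Sequence


def dominates(alpha: Sequence[float], beta: Sequence[float]) -> bool:
    """True if alpha dominates beta: no worse in every objective and
    strictly better in at least one (all objectives minimized)."""
    if len(alpha) != len(beta):
        raise ValueError("objective vectors must have the same length")
    no_worse = all(a <= b for a, b in zip(alpha, beta))
    strictly_better = any(a < b for a, b in zip(alpha, beta))
    return no_worse and strictly_better


def pareto_set(objective_vectors: Sequence[Sequence[float]]) -> List[int]:
    """Return indices of the vectors not dominated by any other vector."""
    return [
        i
        for i, f in enumerate(objective_vectors)
        if not any(
            dominates(g, f) for j, g in enumerate(objective_vectors) if j != i
        )
    ]


# Hypothetical objective vectors, e.g. (failure risk, negative mission payoff).
candidates = [(1.0, 5.0), (2.0, 2.0), (3.0, 3.0), (5.0, 1.0), (2.0, 2.0)]

if __name__ == "__main__":
    print(pareto_set(candidates))
```

Here the vector (3.0, 3.0) is dominated by (2.0, 2.0) and is excluded, while the two identical vectors do not dominate each other and are both retained.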


5.5.  System  Degradation  and  State  of  Health 

Degradation is defined as the process of reduction in system
performance through time with respect to some criterion (Figure 2).
Degradation can be reversible (e.g. through maintenance or
self-healing) or irreversible. State of Health (SOH) is a generalized
and normalized way of representing degradation, usually defined in the
$[0, 1]$ domain ($SOH = 1$ corresponds to full health and $SOH = 0$
represents an inoperable system). $\eta$ is used to denote the SOH
($h$ is used in some of the references listed, but is reserved for the
decision variable equality constraints in this work). $\eta$ is
uniformly discretized and included as a component of the state vector.

Fault

$C_{fault}(X) \subseteq C(X)$ is a subset of the state constraints
selected to indicate a significant deviation from nominal behavior,
i.e. a fault. A fault occurs when any of the constraints in
$C_{fault}(X)$ is violated. We expect fault constraints to be defined
on SOH in most cases; however, this definition allows for constraints
on other state variables to be used to indicate a fault (such as
energy depletion).

Failure

Similarly, a subset $C_{failure}(X) \subseteq C(X)$ is defined to
indicate deviations from the nominal behavior that render the system
functionally unusable.
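The definitions of SOH, fault, and failure can be illustrated with a minimal sketch in which each state constraint is a function that is satisfied when its value is non-negative. The thresholds, the state variable names, and the number of discretization levels are hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, List

State = Dict[str, float]
Constraint = Callable[[State], float]  # satisfied when the value is >= 0


def discretize_soh(eta: float, levels: int = 11) -> float:
    """Map a SOH value in [0, 1] to the nearest point of a uniform grid."""
    if not 0.0 <= eta <= 1.0:
        raise ValueError("SOH must lie in [0, 1]")
    step = 1.0 / (levels - 1)
    return round(eta / step) * step


# Hypothetical fault constraints: SOH of at least 0.6 and at least
# 50 Wh of stored energy (a fault defined on a non-SOH state variable).
C_FAULT: List[Constraint] = [
    lambda x: x["soh"] - 0.6,
    lambda x: x["energy_wh"] - 50.0,
]

# Hypothetical failure constraint: SOH of at least 0.2.
C_FAILURE: List[Constraint] = [lambda x: x["soh"] - 0.2]


def violated(constraints: List[Constraint], x: State) -> bool:
    """True if any constraint in the subset is violated in state x."""
    return any(c(x) < 0.0 for c in constraints)


def classify(x: State) -> str:
    """Label a state as 'failure', 'fault', or 'nominal'."""
    if violated(C_FAILURE, x):
        return "failure"
    if violated(C_FAULT, x):
        return "fault"
    return "nominal"


if __name__ == "__main__":
    state = {"soh": discretize_soh(0.47), "energy_wh": 120.0}
    print(state["soh"], classify(state))
```

Failure is checked before fault because a state that violates a failure constraint will typically violate the fault constraints as well.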

5.6.  Prognostics 

In this work prognostics is defined as information on projected change
in plant behavior through time, e.g. due to wear or degradation
(Figure 2). In contrast, a commonly used definition states prediction
of the Remaining Useful Life (RUL) and End of Life (EOL) as the goal
of prognostics (Daigle & Goebel, 2010; Saxena et al., 2008). We
believe that the latter definition may prove to be less convenient for
the purposes of PDM, as obtaining intermediate degradation predictions
could be important. Decisions on how to minimize degradation could
then be made based upon such predictions. For the modeling approach
chosen, incorporating prognostic information into the decision process
amounts to populating the POMDP with state and transition information.

The  following  assumptions  are  made  for  the  above  definition: 

•  A prognostic estimate is defined for a specific instant in
time, given the information available up to that moment

•  Prognosis  depends  on  information  regarding  the  future 
operating  conditions 

•  Uncertainty  in  system  modeling,  outputs,  observations, 
and  current/future  operating  conditions  is  admissible. 

6.  Selecting  a  Policy  Generation  Approach 

Having defined the requirements on PDM methods for the problem class
of interest and described our modeling approach, we now turn to
considering suitable policy generation techniques. Such techniques
are generally classified as either satisficing or optimizing (Simon,
1956), although alternative taxonomies exist as well. The goal of
optimizing techniques is to find solutions on the Pareto frontier or
as close to it as possible. Satisficing techniques, in contrast,
only attempt to find feasible solutions. They are used extensively
in many types of applications and often have the advantage of being
computationally inexpensive. They also generally lend themselves
well to validation and verification.
In this work, however, we chose to formulate the decision-making
problem from the optimization point of view, primarily because we
believe that this will allow us to take greater advantage of
prognostic information. In the rest of the section we comment only
briefly on the major types of optimization methods with respect to
the requirements in Section 3. As we do not aim to provide a
comprehensive survey of modern optimization techniques, interested
readers can refer to (Das & Chakrabarti, 2005), (Shan & Wang, 2009),
or (Rao, 2009), to list a few.

Exhaustive search (brute-force) methods are generally
straightforward to implement and are capable of generating exact
Pareto sets. Scalability is the main issue with this type of method,
as such methods quickly become computationally intractable. They
can, however, be useful for verifying the performance of other
optimization methods on simple problems.

Gradient  Descent,  Hill  Climbing  and  similar  local  search 
methods  are  not  guaranteed  to  find  global  optima.  Gradient 
Descent  methods  also  generally  require  objective  functions 
to  be  defined  and  differentiable  over  the  entire  search  space. 
Linear  Programming,  Constraint  Programming,  Newton, 
and  Quasi-Newton  methods  require  knowledge  of  objective 
function  properties  as  well. 

Dynamic Programming (DP) methods are widely used for policy
generation. The main downsides of traditional DP formulations are
that for multi-objective problems a single composite objective
function needs to be constructed, i.e. a Pareto set is not produced,
and that system decomposition can be difficult to accomplish. Some
DP-based methods have been developed, however, that attempt to
circumvent both of these issues (Hussein & Abo-Sinna, 1993; Driessen
& Kwok, 1998; Liao, 2002). Additionally, with factored state spaces
being exponential in size with the number of state variables, exact
DP methods become unsuitable for large-size problems. In certain
applications, approximate DP methods have been used (Kveton,
Hauskrecht, & Guestrin, 2006).

Stochastic methods (such as Simulated Annealing, Quantum Annealing,
Metropolis-Hastings, Cross-Entropy, or Probability Collectives)
generally satisfy the requirements we proposed in Section 3. None of
them guarantee optimality; they do, on the other hand, possess the
anytime property (they can be interrupted at any time and still
return a valid result), can be used with blackbox objective
functions, and can accommodate system decomposition.

Genetic algorithms (often classified together with stochastic
methods) also satisfy the proposed requirements. In such algorithms
a prototype (candidate) solution is described as an individual
member of a population. Biologically-inspired operators (selection,
reproduction, mutation, and others) are used, guided by fitness
functions. Genetic algorithms produce a Pareto front approximation
in each iteration and, therefore, are also anytime.

For this phase of the work we ultimately chose to develop a policy
generation method based on Probability Collectives. In the future,
we also plan to investigate policy generation via genetic
algorithms. A method based on Simulated Annealing (SA) was used in
the prototype constraint redesign framework (Section 9).

7.  Policy  Generation  Algorithm  Development 

The current policy optimization algorithm is referred to as the
Probabilistic Policy Generator (PPG). With its roots in the work on
Probability Collectives (PC) (Wolpert, Strauss, & Rajnarayan, 2006),
it belongs to the class of blackbox optimization methods. Such
methods have the goal of finding a value $x \in X$ that minimizes an
associated value $F(x)$. $X$ is an optimization space (not to be
mistaken for the $X$ used to denote POMDP state vectors in other
parts of this work) and $F(x)$ could be an objective or a utility
function. The following process is repeated iteratively: (1) an $x$
is chosen from $X$; (2) statistical information about $F(x)$ is
updated; (3) the next value of $x$ is chosen using the $(x, F(x))$
pairs found up to that point.

The main difference between the conventional blackbox approaches and
PC is that while the former operate directly on the values of $x$
(by constructing a map $M$ from a subset of $\{(x, F(x))\}$ to the
next sample $x$), the latter works with probability distributions
over $x$. That is done by specifying a map $m$ from a subset of
$\{(x, F(x))\}$ to the next distribution over $X$, $P(X)$. That
distribution is then sampled to select the next value of $x$. The
goal of conventional blackbox approaches is to design $M$ in such a
way as to increase the likelihood of finding values of $x$
corresponding to small values of $F(x)$. In the PC case, the goal
for designing $m$ is to generate $P(X)$ peaked around the small
values of $F(x)$. This can be more formally described, for example,
in terms of the expected value:

$$\text{find } \min \int F(x)\, p(x)\, dx, \quad \text{s.t.}$$

$$x \in X, \quad \int p(x)\, dx = 1, \quad p(x) \geq 0 \ \ \forall x,$$

with the integrals replaced by sums for discrete distributions.
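The idea of reshaping a distribution over a discrete space, rather than choosing the next sample directly, can be sketched as follows. This is not the PC update of Wolpert et al.; a simple Boltzmann-style reweighting is swapped in as a stand-in, and the function name, the temperature parameter, and the optimistic treatment of unsampled points are all assumptions made for illustration.

```python
import math
import random
from typing import Callable, Dict, List


def sample_and_reshape(F: Callable[[int], float], X: List[int],
                       iterations: int = 200, temperature: float = 1.0,
                       seed: int = 0) -> Dict[int, float]:
    """Repeatedly sample x from p, record F(x), and rebuild p so that it
    is peaked around points with small observed values of F."""
    rng = random.Random(seed)
    totals = {x: 0.0 for x in X}
    counts = {x: 0 for x in X}
    p = {x: 1.0 / len(X) for x in X}  # start from a uniform distribution
    for _ in range(iterations):
        x = rng.choices(X, weights=[p[y] for y in X])[0]
        totals[x] += F(x)
        counts[x] += 1
        means = {y: totals[y] / counts[y] for y in X if counts[y]}
        # Assumption: unsampled points are given the best value seen so
        # far, so that they keep being explored.
        optimistic = min(means.values())
        scores = [math.exp(-means.get(y, optimistic) / temperature) for y in X]
        z = sum(scores)
        p = {y: s / z for y, s in zip(X, scores)}
    return p


# Hypothetical objective with its minimum at x = 3.
p = sample_and_reshape(lambda x: float((x - 3) ** 2), list(range(7)))
```

The returned distribution sums to one and, for this toy objective, places most of its mass near the minimizer, which corresponds to the discrete (summation) form of the expected-value formulation.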

There are a number of advantages to working with distributions over
$X$ rather than working with $X$ directly. One is that the same
algorithm could, in most cases, be used for different types of space
$X$ without significant modifications. Another is that $P$ generated
by a PC-based algorithm will be peaked in some dimensions, while
being broad in others, thus supplying sensitivity information on the
importance of getting better estimates for the values of those
dimensions. A PC-based algorithm can also be used to combine and,
ideally, improve upon solutions produced by other optimization
algorithms. To do that, $P$ is initialized to a set of broad peaks,
each centered on a solution generated by the other algorithm(s). As
$P$ is updated, the shapes of the peaks are defined further and some
of them become merged, producing combined solutions. Finally, the
approach can be extended to multi-component vectors $x$ in a
relatively straightforward fashion.


The earlier versions of the PPG algorithm were described in (Balaban
et al., 2011; Narasimhan et al., 2012). The algorithm uses
'look-ahead' sampling to aggregate information about policy options,
gradually increasing the probability of choosing the better
solutions. Its input parameters are the following:

•  $A$: valid actions set

•  $f(\pi)$: objective function vector

•  $v$: objective preference vector

•  $G_t$: inequality constraint set

•  $H_t$: equality constraint set

•  $l$: maximum policy length

•  $N_1$: number of utility function calls allocated to the first
phase of the algorithm

•  $N_2$: number of utility function calls allocated to the second
phase of the algorithm

•  $M$: number of stages

Execution time is controlled by specifying $l$, $N_1$, $N_2$, and
$M$ (further explained below). The algorithm (see Algorithm 1)
operates in the following manner:


Initialization  (lines  2-5) 

A set of partial policies, $\Pi'$, is initialized with a single
member, $\pi'_0$. For simplicity, a partial policy $\pi'$ is defined
as the set of actions mapped to the first several states achieved
for a decision $\delta$. For instance, $\{a_1, a_2\}$ is a partial
policy corresponding to the decision $\{a_1, a_2, a_3, a_4, a_5\}$.
The probability of $\pi'_0$ achieving maximum utility ($p(\pi'_0)$)
is set to 1. Finally, first phase utility function call quotas are
allocated per stage (for a total of $N_1$), with increasing stage
numbers corresponding to progressively longer policy roots. The
allocation is currently done using a cubic function, with the
earlier stages receiving a greater proportion of the total number.
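The exact form of the cubic allocation function is not given here, so the following is only one possible sketch. It assumes that stage weights fall off as the cube of the number of remaining stages and that any rounding remainder goes to the first stage; the function name and its `stages` argument are hypothetical.

```python
from typing import List


def allocate_utility_function_calls(n1: int, stages: int) -> List[int]:
    """Split n1 calls across stages with cubic weights, so that earlier
    stages receive a greater share (assumed weighting, see lead-in)."""
    weights = [(stages - s) ** 3 for s in range(stages)]  # stages^3, ..., 1
    total = sum(weights)
    quotas = [n1 * w // total for w in weights]
    quotas[0] += n1 - sum(quotas)  # hand the rounding remainder to stage 1
    return quotas


quotas = allocate_utility_function_calls(1000, 4)
```

With 1000 calls and four stages the weights are 64, 27, 8, and 1, so the first stage receives 640 calls and the last receives 10.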


Partial  policy  extension  (lines  7-15) 

The first phase of the algorithm is executed for $M$ stages. In each
stage the partial policies in $\Pi'$, generated during the preceding
stages, are extended and the probability of each of them resulting
in an optimal solution is estimated. In order to extend the partial
policies, sets of feasible follow-on actions are determined first.
In the example problem described in Section 11, the rover should
visit each of its target locations at most once. Thus, if a maximum
of five locations is to be visited, the partial policy
$\pi' = \{a_1, a_2\}$ (move to node 1, then to node 2) has
$A_{\pi'} = \{a_3, a_4, a_5\}$ as the set of possible follow-on
actions. The valid one-action extensions are then
$\{a_1, a_2, a_3\}$, $\{a_1, a_2, a_4\}$, and $\{a_1, a_2, a_5\}$.
These offspring partial policies replace the parent partial policy
($\pi'$) in $\Pi'$ and split its probability value evenly.
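The extension step can be sketched in a few lines. The representation below (partial policies as tuples of node indices, probabilities in a dictionary) is an assumption made for illustration and is not taken from the PPG implementation.

```python
from typing import Dict, Set, Tuple

PolicyRoot = Tuple[int, ...]


def extend_policy_roots(roots: Dict[PolicyRoot, float],
                        targets: Set[int]) -> Dict[PolicyRoot, float]:
    """Replace each partial policy with its one-action extensions,
    splitting the parent's probability evenly among the offspring."""
    extended: Dict[PolicyRoot, float] = {}
    for root, prob in roots.items():
        # Each target may be visited at most once.
        follow_on = sorted(targets - set(root))
        if not follow_on:
            extended[root] = prob  # nothing left to add; keep the root
            continue
        for action in follow_on:
            extended[root + (action,)] = prob / len(follow_on)
    return extended


# The example from the text: root {a_1, a_2} with five target locations.
children = extend_policy_roots({(1, 2): 1.0}, {1, 2, 3, 4, 5})
```

Here `children` holds the three offspring roots ending in nodes 3, 4, and 5, each with one third of the parent's probability.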

Partial  policy  probability  estimation  (lines  17-22) 

The probability of each partial policy in the updated $\Pi'$
achieving maximum utility is estimated next. To achieve that,
$\Pi'$ is sampled randomly according to the prior distribution. Each
sample $\pi'$ is used to obtain a decision of the maximum length
$l$, with valid completion actions selected from $A_{\pi'}$. The
policy $\pi$ corresponding to the sample decision is then evaluated
with respect to the objective function vector $f$ and the constraint
set $C(X)$. Note that in order to satisfy the constraints, the
extended decision may be truncated short of the maximum length. For
instance, if $\delta = \{a_1, a_2, a_3, a_4, a_5\}$ does not satisfy
one or more of the system constraints, while
$\delta = \{a_1, a_2, a_3, a_4\}$ does, then the latter is picked.
The utility value $u(\pi)$ is computed (currently by using the
preference vector $v$) and the posterior probability of $\pi'$ is
adjusted after the sampling process is complete. A Normalized Root
Mean Squared Error (NRMSE) metric is used to aggregate information
on how well $\pi'$ is performing relative to the maximum utility
value seen so far:


$$\mathrm{NRMSE} = \sqrt{\frac{\sum \left(u_{max} - u(\pi)\right)^2}{n \left(u_{max} - u_{min}\right)^2}},$$

where $n$ is the number of sample decisions constructed for $\pi'$,
and $u_{min}$ and $u_{max}$ are the minimum and the maximum values
of the utility function observed so far, respectively. The metric is
the same as a normalized $L_p$ metric (Coello, Lamont, & Veldhuizen,
2007), with $p = 2$.
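The NRMSE metric can be computed directly from the utilities sampled for one partial policy. The sketch below assumes a plain list of utility values and returns 0 when all observed utilities are equal; that guard is an assumption, since the degenerate case is not discussed in the text.

```python
import math
from typing import Sequence


def nrmse(utilities: Sequence[float], u_min: float, u_max: float) -> float:
    """Normalized root mean squared distance of sampled utilities from
    the best utility seen so far (0 is best, 1 is worst)."""
    if u_max == u_min:
        return 0.0  # assumed convention for the degenerate case
    n = len(utilities)
    squared = sum((u_max - u) ** 2 for u in utilities)
    return math.sqrt(squared / (n * (u_max - u_min) ** 2))
```

A partial policy whose samples all reach `u_max` scores 0, and one whose samples all sit at `u_min` scores 1, so lower values indicate roots that are more likely to lead to high-utility policies.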

Monte  Carlo  simulation  on  II'  (lines  27-32) 

Once the probability distribution $P(\Pi')$ is shaped, a Monte Carlo
simulation is run for $N_2$ sample policies. Policy roots are picked
according to the distribution, extended to the maximum length
satisfying $C(X)$, and evaluated with respect to $f$.

Solution  set  filtering  (lines  36-38) 

Finally, the solution set $\Pi^*$ is reduced using a variant of the
bounded objective method and according to the priority vector $v$.
The objective functions in $f(\pi)$ are sorted in descending order,
based on the values in $v$, $|v| = K$. $\Pi^*$ is then reduced to
$\Pi^*_{f_1}$, where the highest-ranked objective is maximized.
$\Pi^*_{f_1}$ is subsequently reduced to $\Pi^*_{f_2}$ and so on,
until either $|\Pi^*_{f_k}| = 1$ ($k = 1, 2, \ldots, K$) or $k = K$.
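The filtering step amounts to a lexicographic reduction of the solution set. The sketch below assumes that each policy carries a tuple of objective values to be maximized and that `v` holds one preference value per objective; the names and data layout are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def filter_solutions(objectives: Dict[str, Tuple[float, ...]],
                     v: Sequence[float]) -> List[str]:
    """Reduce the policy set objective by objective, in descending
    order of preference, keeping only the maximizers at each step."""
    order = sorted(range(len(v)), key=lambda k: v[k], reverse=True)
    remaining = list(objectives)
    for k in order:
        if len(remaining) <= 1:
            break
        best = max(objectives[p][k] for p in remaining)
        remaining = [p for p in remaining if objectives[p][k] == best]
    return remaining


# Hypothetical objective vectors for three policies.
scores = {"pi_a": (3.0, 1.0), "pi_b": (3.0, 2.0), "pi_c": (1.0, 9.0)}
chosen = filter_solutions(scores, v=(0.7, 0.3))
```

With the first objective ranked highest, `pi_a` and `pi_b` tie and the second objective breaks the tie in favor of `pi_b`; reversing the preferences selects `pi_c` instead.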
Algorithm 1 PPG

 1: procedure PPG($A$, $f(\pi)$, $v$, $l$, $N_1$, $N_2$, $M$)
 2:   $\pi'_0 \leftarrow \{a_0\}$    ▷ null action to assume the initial state
 3:   $\Pi' \leftarrow \{\pi'_0\}$    ▷ set of all policy roots
 4:   $p(\pi'_0) = 1$    ▷ assign initial probability
 5:   $N_s \leftarrow$ allocateUtilityFunctionCalls($N_1$)
 6:   for stage $\leftarrow 1, M$ do
 7:     for all $\pi'_i$ in $\Pi'$ do
 8:       $A_{\pi'_i} \leftarrow$ getValidActions($\pi'_i$)
 9:       ▷ generate all possible one-action extensions
10:       $\Pi'_{new} \leftarrow$ extendPolicyRoot($\pi'_i$, $A_{\pi'_i}$)
11:       $\Pi' \leftarrow (\Pi' \setminus \{\pi'_i\}) \cup \Pi'_{new}$
12:       for all $\pi'_j$ in $\Pi'_{new}$ do
13:         $p(\pi'_j) \leftarrow p(\pi'_i) / |\Pi'_{new}|$
14:       end for
15:     end for
16:     ▷ update $P(\Pi')$
17:     for $i \leftarrow 1, N_s(stage)$ do
18:       $\pi'_s \leftarrow$ getRandomSample($\Pi'$, $P(\Pi')$)
19:       $\pi_s \leftarrow$ extendPolicy($\pi'_s$, $l$)
20:       $f(\pi_s) \leftarrow$ evaluatePolicy($\pi_s$, $f(\pi)$, $H$, $G$)
21:       $u_s \leftarrow$ calculateUtility($f(\pi_s)$, $v$)
22:       $p(\pi'_s) \leftarrow$ updateRootProbability($\pi'_s$, $u_s$)
23:     end for
24:   end for
25:   $\Pi' \leftarrow \{\Pi'_{new}\}$
26:   $\Pi_{mc} \leftarrow \emptyset$
27:   ▷ Monte Carlo simulation on $\Pi'$
28:   for $i \leftarrow 1, N_2$ do
29:     $\pi'_{mc} \leftarrow$ getRandomSample($\Pi'$, $P(\Pi')$)
30:     $\pi_{mc} \leftarrow$ extendPolicy($\pi'_{mc}$, $l$)
31:     $f(\pi_{mc}) \leftarrow$ evaluatePolicy($\pi_{mc}$, $f(\pi)$, $H$, $G$)
32:     $\Pi_{mc} \leftarrow \Pi_{mc} \cup \{\pi_{mc}\}$
33:   end for
34:   $\Pi^* \leftarrow \Pi_{mc}$
35:   ▷ Filter policy set
36:   $f(\pi)_{sorted} \leftarrow$ sortDescending($f(\pi)$, $v$)
37:   $k = 1$
38:   while ($|\Pi^*| > 1$) & ($k < K$) do
39:     for all $\pi$ in $\Pi^*$ do
40:       $\Pi^* \leftarrow \{$all $\pi$ in $\Pi^* \mid f_k(\pi)$ is max$\}$
41:     end for
42:   end while
43: end procedure


8.  Dynamic  Constraint  Redesign 

The preceding sections of the paper concentrated on the
incorporation of prognostic information into the decision making
process and the selection of appropriate policy optimization
methods. The outcome of a multi-objective optimization is a Pareto
set of policies $\Pi^*$. There are three "goldilocks" possibilities
with respect to the size of $\Pi^*$:

1.  The size is acceptable, i.e. $1 \leq |\Pi^*| \leq N$, where $N$
is the maximum number of candidate policies that can be practically
down-selected by inspection, using heuristic methods, or by some
other means.

2.  The size is too large, i.e. $|\Pi^*| > N$. In this case the set
can be reduced either through interaction with a human expert (as
described earlier in (Iyer et al., 2006)) or through an autonomous
process that adds/tightens constraints in $C(X)$ and re-runs the
optimization until a $\Pi^*$ of a desired size is achieved.

3.  No feasible solutions exist, i.e. $|\Pi^*| = 0$. In this case
the original constraints in $C(X)$ may need to be relaxed or
eliminated.

The second case is an interesting research area that we hope to
explore further in the future. In the current work, however, we
focus on the third case. Besides the absence of feasible solutions,
there could be another reason why $\Pi^*$ may not be suitable, which
is the subject of the next section.

8.1.  Performance  goals  satisfaction 

Consider the case where, in addition to constraints in $C(X)$,
constraints (or, rather, performance goals) were also defined for
some or all of the elements of $f$, as is done in Goal Programming
(Tamiz, Jones, & Romero, 1998), for instance:

$$\Gamma(f) = \{\gamma_1(f) \geq 0,\ \gamma_2(f) \geq 0,\ \ldots,\ \gamma_{|f|}(f) \geq 0\}.$$

An example of a Pareto set not satisfying some of the performance
goals in $\Gamma(f)$ is illustrated in Figure 3.


Figure 3. An example of a Pareto set not satisfying a performance
goal ($\gamma_{f_j}$).


If  no  acceptable  solutions  are  found  during  the  optimization 
process  and  if  the  performance  goals  are  considered  to  be  of 
high  enough  importance,  then  constraints  in  C(X)  may  need 
to  be  changed  or  eliminated. 
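Checking a Pareto set against performance goals can be sketched as a simple filter. The sketch assumes that each goal is a function of the objective vector that is met when its value is non-negative; the policy names, objective values, and goal thresholds are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Objectives = Tuple[float, ...]
Goal = Callable[[Objectives], float]  # met when the value is >= 0


def acceptable_policies(pareto: Dict[str, Objectives],
                        goals: List[Goal]) -> Dict[str, Objectives]:
    """Keep only the Pareto-optimal policies that meet every goal."""
    return {name: f for name, f in pareto.items()
            if all(g(f) >= 0 for g in goals)}


# Hypothetical Pareto set with two objectives per policy.
pareto = {"pi_a": (4.0, 9.0), "pi_b": (7.0, 6.0)}
goals = [lambda f: f[0] - 5.0, lambda f: f[1] - 8.0]

acceptable = acceptable_policies(pareto, goals)
needs_redesign = len(acceptable) == 0
```

In this toy case no policy meets both goals, so `needs_redesign` is true, which is the situation in which constraints in $C(X)$ would be considered for relaxation.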

For convenience, in this paper we refer to the process of modifying
system constraints as Dynamic Constraint Redesign (DCR). In the
context of an aerospace vehicle, DCR could mean knowingly damaging a
component or a subsystem beyond repair if that means saving the
overall vehicle. Only system constraints are considered for the
purpose of this discussion; however, one can also envision
eliminating or relaxing external constraints, such as airspace
restrictions or flight separation distances.

Annual Conference of the Prognostics and Health Management Society 2012

DCR can also be thought of as redesigning the vehicle "on the fly",
by changing its performance characteristics beyond the known
envelope, while simultaneously searching for a Pareto optimal policy
to best utilize the modifications in the current mission. Some of
the same issues arise as during the original design, e.g. subsystem
compatibility assurance, choice of design variables, and design
variable sensitivity analysis. In the last few decades the field of
Multidisciplinary Design Optimization (MDO) has been developed to
address these and other issues during the initial design of complex
systems. We believe that some of the techniques from the MDO
community could be beneficial in the development of DCR as well.

8.2.  Multidisciplinary  Design  Optimization  (MDO) 

In this section we briefly review some of the most popular MDO
approaches and comment on their applicability to DCR (far more
extensive descriptions of contemporary MDO approaches and methods
can be found, for instance, in (Agte et al., 2009; Shan & Wang,
2009; Honda, Ciucci, Lewis, & Yang, 2010)). First, however, it would
be helpful to note some key differences between MDO and DCR
problems:

•  Robust  validation  and  verification  of  a  candidate  point 
design  using  independent  methods  may  not  be  possible 
for  PDM/DCR,  unlike  in  MDO; 

•  Related  to  the  preceding  point,  the  risk  associated  with 
each  potential  DCR  solution  needs  to  be  quantified; 

•  Achieving  real-time  performance  will,  generally,  be  of 
far  greater  importance  to  PDM/DCR  than  to  MDO. 

One of the ways to classify modern MDO algorithms is into two broad
categories: All-At-Once (AAO) and decomposition. All-At-Once
algorithms, also referred to as All-In-One (AIO) or single-level,
aim to achieve design decisions through a single global optimization
process (Cramer, Dennis, Frank, Shubin, & Lewis, 1993; N. Brown,
2004). While such formulations have some attractive qualities (for
instance, each iteration produces a discipline-feasible solution and
sensitivity analysis on design variables is usually easy to
perform), they also have significant downsides. A designer using AAO
methods is likely to run into scalability issues when applying them
to large, complex systems. Also, by aggregating knowledge from the
subsystems into a single optimizer, some of the discipline-specific
knowledge may be lost. Finally, AAO approaches tend to limit the use
of well-proven analysis and optimization techniques at the
discipline level.

Decomposition methods break down a design optimization problem into
multiple subproblems, usually along the boundaries of disciplines,
subsystems, or individual components (Cramer et al., 1993). Some of
the better known methods are bi-level, such as Collaborative
Optimization (CO), Concurrent Subspace Optimization (CSSO), or
Bi-Level Integrated System Synthesis (BLISS), and multi-level, such
as Analytical Target Cascading (ATC).

CO (Braun, Gage, Kroo, & Sobiesky, 1996; Roth & Kroo, 2008; Roth,
2008) uses target values of the design and state variables,
specified at the system level, to guide individual discipline
optimizations. Communication between disciplines in most CO
implementations is limited, which simplifies implementation, but can
also result in slow convergence.

The CSSO method (J. E. Renaud & Gabriele, 1993;
Sobieszczanski-Sobieski, Agte, & Sandusky, 1998; Sellar, Batill, &
Renaud, 1996; G. Renaud & Shi, 2002) performs discipline-specific
optimization using local objective functions, variables, and
constraints, while approximating effects on system performance using
Global Sensitivity Equations, Response Surfaces, or other types of
system models. Similarly, system-level models of disciplines are
used in order to approximate their behavior. As performance
information is accumulated throughout the process, the models can be
updated correspondingly.

In  BLISS  (Sobieszczanski-Sobieski  et  al.,  1998; 
Sobieszczanski-Sobieski,  Emiley,  Agte,  &  Sandusky, 
2000)  each  iteration  of  the  procedure  improves  the  design 
both  on  the  local  (discipline)  and  system  levels.  First, 
a  concurrent  local  optimization  is  performed  using  the 
discipline  design  variables  and  keeping  the  system-level 
variables  constant.  Then,  a  system-level  optimization  on 
shared  variables  is  done.  Total  derivatives  are  communicated 
among  the  disciplines  to  help  predict  the  effects  of  local 
design  choices  on  the  other  disciplines. 

Analytical Target Cascading (ATC) (Kim, 2001; Kim, Michelena,
Papalambros, & Jiang, 2003; Allison, Kokkolaras, Zawislak, &
Papalambros, 2005) is primarily intended for problem decomposition
by subsystems and components, rather than disciplines. The ATC
approach is flexible and multi-level, allowing complex system
architectures to be represented. Other formal MDO methods can
potentially be integrated within an ATC framework (Agte et al.,
2009).

Methods founded on the principle of Lagrangian Duality (LD) may also
be of interest for certain elements of PDM/DCR. Classical LD methods
are generally applied to convex problems and accommodate
decomposition into smaller sub-problems. In order to handle
non-convex problems, Augmented Lagrangian Duality (ALD) theory has
been developed (Hestenes, 1969). ALD algorithms, however, lose the
decomposition capability. In recent years, several research efforts
combined LD and ALD approaches to attain both the 'convexification'
properties of ALD and the decomposition properties of traditional LD
(Blouin, Lassiter, Wiecek, & Fadel, 2005; Tosserams, Etman,
Papalambros, & Rooda, 2005).

Finally, MDO methods that have evolved from the field of Game Theory
offer some promising alternatives for design decomposition
architectures. The idea of using game formulations in design
problems goes back to the work of Vincent (Vincent, 1983) and Rao
and Freiheit (Rao & Freiheit, 1991). Some of the further
developments are described in (Lewis & Mistree, 1997), (Marston,
2000), and (Clarich & Pediroda, 2004). Games of different forms have
been studied for use in MDO applications, at least to some extent:
cooperative (Pareto), approximately cooperative, non-cooperative
(Nash), coalition, and leader/follower. While intuitively a
cooperative (Pareto) form game would appear to be the natural choice
when setting up an MDO or a PDM/DCR problem, the other forms have
their place as well. For instance, the leader/follower (also known
as Stackelberg or extensive) form can be used to set up a sequential
analysis problem. The non-cooperative (Nash) form could be used in
situations when the established communication protocols between
subsystems prove to be insufficient for a particular situation or
are affected by a system fault. The coalition form can be used in
organizing system analysis by discipline.

For the first DCR prototype we chose to implement a cooperative
game-theoretic protocol (described in the next section), with
alternative formulations to be implemented and compared in future
work. Similarly to BLISS, the implemented algorithm passes the
derivatives of local objective functions with respect to shared
variables. This is done in order to inform subsystems of the effects
their choices may have on the other subsystems.

9.  DCR Algorithm Development

In the prototyped game-theoretic DCR algorithm the players
(subsystems) cooperate in exploring the potentially very large
option space by taking turns in conducting the search and, when
necessary, relaxing some of their constraints. The current
formulation of the algorithm tests the concept for two subsystems,
with extension to larger numbers of subsystems planned for
subsequent work. One constraint per subsystem is currently chosen as
the target for redesign ($c_1$ and $c_2$).

The process (illustrated in Figure 4) starts with one player
randomly picked to go first (let us assume that it is Subsystem 1).
Subsystem 1 conducts an iteration of the search, finding its best
guess at the optimal policy $\pi^*$. The policy needs to satisfy the
constraints in both $C$ and $\Gamma$. Also, a maximum of $N$ utility
function calls is allowed per iteration. If no acceptable policy is
found, the target constraint $c_1$ is adjusted (becoming, for
instance, $62 - T_{max} \geq 0$). Another search iteration is
performed and the suitability of solutions is evaluated. The process
repeats until a maximum number of search attempts, $N_{max}$, is
reached or a non-empty set $\Pi_1^*$ is found. $\Pi_1^*$, empty or
otherwise, is then sent over to Subsystem 2, along with the
necessary gradient information on objective function performance (in
a non-cooperative formulation only $\Pi_1^*$, also known as the Best
Reply Correspondence or BRC, would be transmitted). Note that
gradient estimates are shared not only for policies in $\Pi_1^*$,
but also for other policies considered during the search. If there
is at least one policy $\pi^* \in \Pi_1^*$ that is also suitable
from the point of view of Subsystem 2, then the process is stopped.
Otherwise Subsystem 2 conducts its own search iteration, adjusting
$c_2$ as needed, and hands over control of the search to Subsystem 1
after either $N_{max}$ search iterations are completed or a
non-empty $\Pi_2^*$ is found. $\Pi_2^*$ and the objective function
gradients are then transmitted back to Subsystem 1. The process
continues until a $\pi^*$ satisfying both subsystems is found.
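The turn-taking protocol can be sketched for two subsystems as follows. This is a heavily simplified illustration and not the prototype itself: the search is replaced by an exhaustive scan of a small policy list, gradient exchange is omitted, the first player is fixed rather than random, and every name, load value, and limit is hypothetical. The constraint of each subsystem is assumed to have the form `limit - load(policy) >= 0`.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Subsystem:
    name: str
    load: Callable[[str], float]  # quantity bounded by the target constraint
    limit: float                  # constraint: limit - load(policy) >= 0
    step: float = 1.0             # base size of a constraint adjustment

    def accepts(self, policy: str) -> bool:
        return self.limit - self.load(policy) >= 0

    def search(self, policies: List[str],
               meets_goals: Callable[[str], bool], n_max: int) -> List[str]:
        """Make up to n_max attempts, relaxing the target constraint by
        a growing step after every attempt that finds nothing."""
        for attempt in range(1, n_max + 1):
            found = [p for p in policies
                     if meets_goals(p) and self.accepts(p)]
            if found:
                return found
            self.limit += self.step * attempt
        return []


def dcr(policies: List[str], meets_goals: Callable[[str], bool],
        first: Subsystem, second: Subsystem,
        n_max: int = 3, max_rounds: int = 10) -> Optional[str]:
    """Alternate the search between two players until one of them finds
    a policy that the other player also accepts."""
    players = (first, second)
    for rnd in range(max_rounds):
        active, other = players[rnd % 2], players[(rnd + 1) % 2]
        for policy in active.search(policies, meets_goals, n_max):
            if other.accepts(policy):
                return policy
    return None


thermal = Subsystem("thermal", lambda p: {"slow": 55.0, "fast": 62.0}[p], 60.0)
power = Subsystem("power", lambda p: {"slow": 10.0, "fast": 14.0}[p], 12.0)
# Hypothetical performance goal that only the "fast" policy can meet.
result = dcr(["slow", "fast"], lambda p: p == "fast", thermal, power)
```

In this toy run the thermal subsystem relaxes its limit until "fast" becomes admissible, the power subsystem rejects it, and the power subsystem then relaxes its own limit during its turn until both players accept the same policy.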

It is important to take a look at how objective functions for each
of the players are designed. In non-cooperative game formulations
(and some of the traditional MDO approaches) discipline/subsystem
objective functions primarily focus on the needs of that particular
discipline or subsystem. In order to help expedite convergence, this
cooperative formulation uses composite objective functions that take
into account the effect a candidate solution may have on global
objectives and on the other players. The functions take on the
following form:

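$$f_1(\pi) = w_{1,1} f_g(\pi) + w_{1,2} \left|\nabla f_{2,l}\right|_{\pi} + w_{1,3} f_{1,l}(\pi),$$

$$f_2(\pi) = w_{2,1} f_g(\pi) + w_{2,2} \left|\nabla f_{1,l}\right|_{\pi} + w_{2,3} f_{2,l}(\pi),$$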

where $f_g$ is the global objective function (currently a single
one), $f_{i,l}$ is the objective function local to the subsystem,
$i$ is the subsystem number, and $w_{i,j}$ are the weights used to
specify the degree of influence of each of the components of $f_i$.
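A composite objective of this kind, a weighted sum of the global objective, the magnitude of the other subsystem's local objective gradient at the candidate policy, and the subsystem's own local objective, can be sketched as below. Interpreting the gradient term as a Euclidean norm is an assumption, as are the function name and the sample numbers.

```python
import math
from typing import Sequence, Tuple


def composite_objective(weights: Tuple[float, float, float],
                        f_global: float,
                        grad_other_local: Sequence[float],
                        f_local: float) -> float:
    """Weighted sum of the global objective, the norm of the other
    player's local objective gradient, and the local objective."""
    w_global, w_grad, w_local = weights
    grad_norm = math.sqrt(sum(g * g for g in grad_other_local))
    return w_global * f_global + w_grad * grad_norm + w_local * f_local


# Hypothetical values for one candidate policy as seen by Subsystem 1.
value = composite_objective((0.5, 0.25, 0.25), f_global=10.0,
                            grad_other_local=(3.0, 4.0), f_local=2.0)
```

Raising the second weight makes a player more sensitive to how strongly its candidate policy affects the other subsystem's local objective.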

Another important feature of the algorithm is that with each
iteration the size of the constraint-adjusting step is increased,
thus encouraging the players to come up with a solution suitable
from the other subsystems' (and global) points of view as quickly as
possible.


Figure 4. Two-subsystem cooperative game formulation


A variant of Simulated Annealing (Bertsimas & Tsitsiklis, 1993), or
SA, is currently utilized for searches of the option space by the
subsystems. In this particular case SA was chosen to take advantage
of the gradient information exchanged by the subsystems, while also
avoiding getting 'stuck' at local minima. The algorithm accomplishes
the latter by performing randomized jumps to other promising
locations of the search space. The probability of continuing with
the local search vs. performing a jump is influenced by an annealing
schedule $T(t)$, and is


$$p[x(t+1) = x_j \mid x(t) = x_i] = w_{ij} \exp\left[-\frac{1}{T(t)} \max\{0, f(x_j) - f(x_i)\}\right],$$


where

$X$       A finite search space (again, not to be mistaken
          for the POMDP state vector).

$f$       A real-valued objective function defined on $X$.
          $X^* \subset X$ is the set of the global minima of $f$.

$X_i$     The neighbor set of $x_i$, $X_i \subset \{X - x_i\}$, $x_i \in X$.

$w_{ij}$  A probabilistic weight for transition from $x_i$ to
          $x_j$, $x_j \in X_i$, s.t. $\sum_{x_j \in X_i} w_{ij} = 1$, with
          $x_j \in X_i \iff x_i \in X_j$ implied.

$T$       The annealing schedule. $T : N \to (0, \infty)$ is a
          non-increasing function and $N$ is the set of positive
          integers; $T(t)$ is the temperature at time $t$.

The above assumes that $x_i \neq x_j$, $x_j \in X_i$. If $x_i \neq x_j$ and
$x_j \notin X_i$, then $p[x(t+1) = x_j \mid x(t) = x_i] = 0$.
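The transition rule can be sketched in a few lines. This is a minimal illustration of the standard simulated annealing step, not the SA variant used by the authors. It assumes that the neighbor weights sum to one and that any leftover probability mass keeps the search at the current point. The function names are hypothetical.

```python
import math
import random
from typing import Callable, Hashable, Sequence


def transition_probability(f_i: float, f_j: float, w_ij: float,
                           temperature: float) -> float:
    """w_ij * exp(-max(0, f(x_j) - f(x_i)) / T(t)) for a neighbor x_j of x_i."""
    return w_ij * math.exp(-max(0.0, f_j - f_i) / temperature)


def sa_step(x_i: Hashable, neighbors: Sequence[Hashable],
            weights: Sequence[float], f: Callable[[Hashable], float],
            temperature: float, rng: random.Random) -> Hashable:
    """Sample x(t+1) given x(t) = x_i; non-neighbors have probability zero."""
    u = rng.random()
    cumulative = 0.0
    for x_j, w_ij in zip(neighbors, weights):
        cumulative += transition_probability(f(x_i), f(x_j), w_ij, temperature)
        if u < cumulative:
            return x_j
    return x_i  # remaining probability mass: stay at the current point
```

Downhill moves are accepted with the full weight $w_{ij}$, while uphill moves are damped by the exponential term, so they become increasingly rare as $T(t)$ decreases.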


10.  Test  Platform 

The testbed being used in the current validation experiments
is the K11 planetary rover prototype and its associated software
simulator (Balaban et al., 2011). Another testbed targeted
for future experiments is the Edge 540 UAV located
at NASA Langley (Hogge, Quach, Vazquez, & Hill, 2011).
While the algorithmic infrastructure is developed to accommodate
the UAV, that part of the work is otherwise in its
early stages.


10.1. K11 overview

The K11 is a large four-wheeled rover platform (approximately
1.4 m long by 1.1 m wide by 0.63 m tall, weighing
roughly 150 kg). Each wheel is driven by an independent
250 W graphite-brush motor, connected through a bearing
and gearhead system, with each motor controlled by a single-axis
digital motion controller. Four 14.8 V, 3.3 Ah lithium-ion
batteries, connected in series, power the vehicle. The
on-board computer runs control and reasoning algorithms and
coordinates data acquisition. Measurements available
on-board are shown in Table 2 and in Figure 5.

Figure 5. K11 data flow

Table 2. K11 data

measurement                               symbol
absolute position (longitude, latitude)   $\lambda, \phi$
wheel angular velocity                    $\omega_{FL}, \omega_{FR}, \omega_{BL}, \omega_{BR}$
attitude (yaw, pitch, roll)               $\alpha, \beta, \gamma$
battery temperature                       $T_{b1}, T_{b2}, T_{b3}, T_{b4}$
battery voltage                           $V_{b1}, V_{b2}, V_{b3}, V_{b4}$
motor temperature                         $T_{mFL}, T_{mFR}, T_{mBL}, T_{mBR}$
motor current                             $I_{FL}, I_{FR}, I_{BL}, I_{BR}$
power bus current                         $I_{bus}$

In the table, F, B, L, and R refer to front, back, left, and right,
respectively. Altitude $h$ is determined using $\lambda$, $\phi$, and a
terrain map $\mathcal{M}$.

The software simulator reproduces both nominal and off-nominal
behavior of the hardware testbed. The simulator has
a dual purpose: (a) to aid in the development of PDM algorithms
as a virtual testbed and (b) to provide $f$ estimates
during the decision-making process.

10.2. Fault Modes

Table 3 describes the K11 fault modes, implemented either
in hardware, in simulation, or in both. Some of the fault modes,
such as sensor faults, are injected primarily for testing diagnostic
functionality (i.e., such faults have brief fault-to-failure
times), while the others exhibit a more continuous fault progression
behavior and are used for validation of prognostic
algorithms.

Table 3. K11 fault modes

fault mode                     subsystem
battery capacity degradation   Power
parasitic electric load        Power
motor failure                  Propulsion
increased motor friction       Propulsion
sensor bias/drift/failure      Sensors

10.3. Diagnostic Functionality

Two diagnostic algorithms are currently in use with the K11
testbed. The first one, QED (Qualitative Event-based Diagnosis),
is described in (Daigle & Roychoudhury, 2010).
It utilizes a qualitative diagnosis methodology that isolates
faults based on the transients they cause in the system behavior,
manifesting as deviations in residual values (Daigle &
Roychoudhury, 2010). The second, Hybrid Diagnosis Engine
(HyDE), is a diagnosis algorithm that uses candidate generation
and consistency checking to diagnose discrete faults in
stochastic hybrid systems (Narasimhan & Brownston, 2007).
'Hybrid' in this case refers to the combined discrete and continuous
models used by the algorithm to analyze input data and
deduce the transitions in system state over time, including
changes indicative of faults.

10.4.  Prognostic  Functionality 

Once a fault is detected and diagnosed, a prognostic algorithm
appropriate to the type of the fault is invoked. For
battery capacity deterioration, as well as for charge estimation,
an algorithm based on the Particle Filter framework is
planned to be used (Saha & Goebel, 2009; Saha et al.,
2011). Prognostic estimation of temperature build-up inside
the electric motors, which can lead to winding insulation
deterioration and eventual failure, will be done using a Gaussian
Process Regression algorithm (Balaban et al., 2011). Finally,
work is in progress to implement prognostics for electronic
components of the motor drive units (such as capacitors
and power transistors) using Kalman Filter and Extended
Kalman Filter approaches (Celaya, Saxena, & Saha, 2011).

11.  Validation  Experiments 

The following section describes the scenarios used for validating
the policy optimization algorithm, PPG, and the constraint
redesign algorithm, DCR. Subsections 11.1 (Policy
optimization) and 11.2 (Dynamic Constraint Redesign) are
structured in a similar manner: formal scenario formulations
are provided first, followed by descriptions of how the experiments
were conducted, with the experimental results summarized
last. Both of the algorithms have only been tested in
simulation at this time.

11.1.  Policy  optimization 

Policy optimization experiments were developed around a
scenario (Scenario R1, with 'R' denoting rover scenarios)
in which an unmanned planetary rover is tasked with visiting
a certain number of locations for science operations. Each
location has a scientific payoff (reward) value associated with
it. The terrain is of variable elevation and the surface friction
coefficient is considered to be constant. The rover has
a finite amount of energy available to complete the mission.
At some point during the mission a system fault is detected
(e.g., a deteriorating electrical connector) that limits the overall
remaining useful life of the vehicle. We also assume that
the degradation rate depends on the operating conditions (e.g.,
the amount of heat generated in the instrumentation compartment
during the drive). Either depletion of energy or complete
component failure signifies EOL. The goal of the PDM
system is to reassess the original mission plan and find a
suitable (ideally, optimal) compromise between extending the
life of the vehicle and achieving the maximum possible
science payoff.

11.1.1.  Scenario  formulation 

Given:

$C_e$, $C_h$
    Inequality constraints on available energy and health

$f(\pi) = \{f_r(\pi), f_h(\pi), f_e(\pi)\}$
    Objective functions for cumulative reward, health
    degradation, and energy consumption

$v = \{v_r, v_h, v_e\}$, ($v_r, v_h, v_e \in [0, 1]$)
    Optimization preferences vector

$N = \{n_1, n_2, \ldots, n_{|N|}\}$
    Nodes (locations) to be visited

$a = \{n_i, n_j\}$, $i \in [1, 2, \ldots, |N| - 1]$, $j \in [2, \ldots, |N| - 1]$
    An action constitutes a move between a pair of nodes
    (start and finish)

$a_1 = \{n_1, n_i\}$, $i \neq 1$
    The first action of a decision is a special case (go from
    the current location, labeled $n_1$, to another node)

$a_m = \{n_j, n_k\} \mid a_{m-1} = \{n_i, n_j\}$, ($i, j, k \in [1, 2, \ldots, |N|]$), ($m \in [2, 3, \ldots, |N|]$)
    Any action after the first one needs to start on the node
    where the previous one finished

Find:

$\Pi^*$
    Pareto set of policies
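The action-chaining rules above can be made concrete with a short sketch. It assumes, for illustration only, that a candidate policy is a complete visiting order over the remaining nodes, starting from the current location $n_1$. The function names are hypothetical.

```python
from itertools import permutations
from typing import Iterator, List, Sequence, Tuple

Action = Tuple[int, int]  # (start node, finish node)


def policy_to_actions(order: Sequence[int]) -> List[Action]:
    """Turn a visiting order into a list of chained node-to-node actions."""
    return [(order[k], order[k + 1]) for k in range(len(order) - 1)]


def is_chained(actions: Sequence[Action], start: int) -> bool:
    """Check that a_1 leaves the start node and each a_m begins where a_(m-1) ended."""
    if not actions or actions[0][0] != start or actions[0][1] == start:
        return False
    return all(prev[1] == nxt[0] for prev, nxt in zip(actions, actions[1:]))


def enumerate_policies(start: int, remaining: Sequence[int]) -> Iterator[List[Action]]:
    """Exhaustively enumerate every visiting order of the remaining nodes."""
    for perm in permutations(remaining):
        yield policy_to_actions((start, *perm))


policies = list(enumerate_policies(start=1, remaining=[2, 3, 4]))
```

Under this assumption the number of candidate policies grows as the factorial of the number of remaining nodes (6 nodes give 720 orderings), which is why exhaustive enumeration becomes expensive quickly.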

11.1.2.  Design  of  experiments 

A synthetic terrain map $\mathcal{M}$ was generated (Figure 6) and ten
waypoints (nodes) still to be visited by the rover were selected
on it. Each node is associated with a reward value (shown
in parentheses). The bar on the right side of the map and the
isolines depict the elevation changes.

Figure 6. Terrain map with scientific target locations (elevations
and distances are in meters)

Test scenarios with increasing numbers of remaining nodes
(6-10) were then created. The nodes were selected in such
a way as to make it impossible for the vehicle to visit
all of them before either energy depletion or vehicle health
deterioration resulted in EOL. PPG was allocated a limited
number of utility function calls (UFC) to test performance in
resource-constrained conditions. An exhaustive search algorithm
(ES), used for verifying PPG results and benchmarking,
was not limited in how many times it could invoke the utility
function. The metric used for evaluating performance was the
cumulative reward for the best path (policy) found by each
algorithm. Each scenario was executed 30 times and the mean
and standard deviation were computed. All of the code was
written in MATLAB (R2010b) and executed on an Intel Core
i7 Duo 2.8 GHz computer.

11.1.3.  Experimental  results 


Figure  7.  Mean  execution  times  comparison 


Table 4 summarizes the cumulative reward results obtained
by ES and PPG. The number of utility function calls used by
ES is provided for comparison. While not quite achieving
scores as high as ES for the larger problems, PPG still
does relatively well, particularly given that in those scenarios
it uses a small fraction of the UFC used by ES (PPG performance
improves, as expected, if more UFC are permitted).
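The trade-off between call budget and reward can be quantified directly from the reported numbers. The sketch below copies the ES results and the 10000-UFC PPG means from Table 4 and relies on the observation that the listed ES call counts equal the factorial of the node count. The function name `summarize` is hypothetical.

```python
from math import factorial
from typing import List, Tuple

# (nodes, ES result, PPG mean reward with a 10000-UFC budget), from Table 4
TABLE_4 = [
    (6, 237, 237.00),
    (7, 311, 305.77),
    (8, 343, 340.83),
    (9, 373, 348.87),
    (10, 403, 388.97),
]
PPG_BUDGET = 10000


def summarize(rows, budget: int) -> List[Tuple[int, float, float]]:
    """Return (nodes, PPG budget as a fraction of ES calls, PPG/ES reward ratio)."""
    summary = []
    for nodes, es_reward, ppg_reward in rows:
        es_calls = factorial(nodes)  # matches the ES UFC column of Table 4
        summary.append((nodes, min(1.0, budget / es_calls), ppg_reward / es_reward))
    return summary


for nodes, call_fraction, reward_ratio in summarize(TABLE_4, PPG_BUDGET):
    print(f"{nodes:2d} nodes: {call_fraction:8.4%} of ES calls, "
          f"{reward_ratio:6.1%} of ES reward")
```

For the 10-node scenario this gives a budget of under 0.3% of the ES calls while retaining roughly 96.5% of the ES reward on average.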

Execution time for each of the algorithms was also recorded
for all of the scenarios, with the data summarized in Table
5. It can be observed that execution times for ES grow
exponentially with problem size, and this approach
becomes impractical for problems containing more than 10
nodes. Although we could not validate the cumulative
reward performance on larger problems in reasonable
time, we still tested PPG with scenarios containing 15,
20, and 25 nodes. The average execution times are presented
in Figure 7 and lead us to believe that the approach adopted
for PPG remains practical for real-time applications even for
policies with large numbers of actions (at least up to 25). The
question of how to evaluate the quality of generated policies
in large-size problems is something we hope to investigate in
subsequent work.

11.2.  Dynamic  Constraint  Redesign 

To test the DCR algorithm, a scenario was used (Scenario
R2) where one of the rover motors (FL) has experienced an
Increased Motor Friction fault. This results in increased current
consumption by the motor and, consequently, a higher
rate of heat build-up both in it and in the batteries supplying the
current. For the purposes of this scenario the batteries are
viewed as a single unit, with its temperature denoted by $T_b$.
The temperature of the affected motor is denoted by $T_m$. Even
given the fault, the rover is still required to travel a certain
distance in a given amount of time in order to reach a point
favorable for battery recharging and communication with
controllers. To accomplish that, the rover needs to alternate
periods of driving with stationary periods, in order not to
exceed the maximum temperature limits for both the battery
and the motor. The two components belong to the Power (Po)
and Propulsion (Pr) subsystems, respectively. As the components
heat up and cool at different rates, a suitable schedule
of driving and cooldown periods (policy) needs to be
negotiated between the subsystems. As no acceptable policies
may exist that satisfy both the minimum distance and the
maximum time constraints, the two subsystems may need to
negotiate increases in their operating temperature limits. It is
in the interest of each subsystem to keep its limit as low as
possible, in order to reduce the risk of failure. The rover, as
a whole, is also interested in keeping the risk of component
failure as low as possible, while still reaching the destination
in the time allotted.
Table 4. Maximum cumulative reward values obtained by ES and PPG algorithms (in points)

nodes  ES UFC    ES result  500 UFC          5000 UFC         10000 UFC
                            PPG mean (σ)     PPG mean (σ)     PPG mean (σ)
6      720       237        235.60 (05.33)   237.00 (00.00)   237.00 (00.00)
7      5040      311        295.80 (11.29)   305.77 (09.11)   305.77 (09.11)
8      40320     343        329.93 (08.51)   342.57 (02.37)   340.83 (04.93)
9      362880    373        326.27 (11.31)   345.47 (16.50)   348.87 (16.76)
10     3628800   403        347.47 (22.93)   382.73 (18.04)   388.97 (16.47)

Table 5. ES and PPG execution time (in seconds)

nodes  ES UFC    ES mean (σ)         500 UFC           5000 UFC          10000 UFC
                                     PPG mean (σ)      PPG mean (σ)      PPG mean (σ)
6      720       0.0192 (0.0003)     0.1756 (0.0049)   1.5961 (0.0183)   3.3543 (0.2253)
7      5040      0.1154 (0.0020)     0.2079 (0.0083)   2.2419 (0.1778)   4.5870 (0.4473)
8      40320     0.8385 (0.0105)     0.2212 (0.0114)   3.5883 (0.2132)   9.0177 (0.8424)
9      362880    8.0367 (0.0448)     0.2350 (0.0072)   4.1321 (0.1315)   12.7910 (0.3515)
10     3628800   310.9904 (3.9258)   0.2412 (0.0060)   4.4041 (0.2654)   14.4285 (0.7322)

11.2.1.  Scenario  formulation 

Given:

$v_c$ = 0.3 m/s
    the minimum velocity the rover can maintain without
    stalling, given the fault. Also assumed to be the best
    (cruise) velocity in terms of energy efficiency

$T_{b,init}$ = 40 °C
    the initial operating temperature of the battery

$T_{m,init}$ = 35 °C
    the initial operating temperature of the motor

$T_{b,max_0}$ = 60 °C
    the initial operating temperature limit for the battery

$T_{m,max_0}$ = [value illegible in source] °C
    the initial operating temperature limit for the motor

$T_a$ = 30 °C
    the ambient temperature (constant)

$I_s$ = 5 A
    peak current drawn by the affected motor in order to
    reach $v_c$ from full stop (start current)

$I_c$ = 2 A
    current drawn by the affected motor at $v_c$ (cruise current)

$d_{min}$ = 500 m
    the minimum traverse distance

$t_{max}$ = 3600 s
    the maximum time to reach the destination

$t_s$ = 2 s
    the time needed to achieve cruise velocity from a
    complete stop

A notional current profile for the damaged motor is shown in
Figure 8. For simplicity, current draw by the three healthy
motors was assumed to be constant throughout the motion at
1 A each. It is also assumed that prognostic information on
battery and motor EOL is provided.


Find:

$t_d$
    drive period duration

$t_c$
    cooldown period duration

$T_{b,max_f}$
    final operating temperature limit for the battery,
    $T_{b,max_f} \in [T_{b,max_0}, \infty)$

$T_{m,max_f}$
    final operating temperature limit for the motor,
    $T_{m,max_f} \in [T_{m,max_0}, \infty)$

In this formulation $t_d$, $t_c$, $T_{b,max_f}$, and $T_{m,max_f}$ are the decision
variables.

11.2.2.  Design  of  experiments 

Each subsystem was given a maximum of $M = 3$ search iterations
before it had to relinquish control of the process. $t_d$
and $t_c$ could be picked from the interval between 10 and 100 s, in
10 s increments. A simplified version of the simulator (tracking
only the distance traveled and the temperature state of the
affected motor and the battery) was used as the utility function,
in order to speed up execution. The following general
thermal state equation was used in the simulator:

$$dT = \frac{1}{C_t}\left(R I^2 + h(T_a - T)\right) dt,$$

where $T$ is the component temperature, $C_t$ is the thermal capacity
coefficient, $R$ is the electrical resistance, $I$ is the current,
$h$ is the heat transfer coefficient, and $T_a$ is the ambient
temperature. Model parameters used in the experiments are
provided in Table 7.
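A minimal sketch of how the thermal state equation could be integrated over a drive/cooldown schedule is given below. It is not the authors' simulator. It assumes forward-Euler integration with a 1 s step, zero motor current during cooldown, and distance accumulating at $v_c$ for the whole drive period. Default values are the scenario givens and the motor column of Table 7, and the function names are hypothetical.

```python
from typing import Tuple


def thermal_step(T: float, I: float, C_t: float, R: float, h: float,
                 T_a: float, dt: float) -> float:
    """One forward-Euler step of dT = (R*I^2 + h*(T_a - T)) / C_t * dt."""
    return T + (R * I ** 2 + h * (T_a - T)) / C_t * dt


def simulate_motor(t_d: float, t_c: float, t_total: float,
                   T0: float = 35.0, T_a: float = 30.0, v_c: float = 0.3,
                   I_s: float = 5.0, I_c: float = 2.0, t_s: float = 2.0,
                   C_t: float = 11.0, R: float = 0.5, h: float = 0.03,
                   dt: float = 1.0) -> Tuple[float, float]:
    """Return (distance travelled in m, peak motor temperature in deg C)."""
    T, peak, distance, t = T0, T0, 0.0, 0.0
    while t < t_total:
        phase = t % (t_d + t_c)           # position within the current cycle
        if phase < t_d:                   # driving
            I = I_s if phase < t_s else I_c
            distance += v_c * dt          # assumption: ramp-up is ignored
        else:                             # cooling down, motor idle
            I = 0.0
        T = thermal_step(T, I, C_t, R, h, T_a, dt)
        peak = max(peak, T)
        t += dt
    return distance, peak


distance, peak = simulate_motor(t_d=50, t_c=80, t_total=3600)
```

Evaluating such a function for each of the 100 candidate $(t_d, t_c)$ pairs gives the distance and peak temperature that a subsystem would compare against $d_{min}$ and its current temperature limit.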


Table 7. Model parameters

parameter   motor   battery   units
$C_t$       11      25        J/K
$R$         0.5     1.0       Ohm
$h$         0.03    0.08      W/K


Annual  Conference  of  the  Prognostics  and  Health  Management  Society  2012 


Figure 8. Current profile for the damaged motor

Table 6. DCR iterations in the example run

iteration          1    2    3    4    5    6    7    8    9    10   11   12   13   14   15
active subsystem   Po   Po   Po   Pr   Pr   Pr   Po   Po   Po   Pr   Pr   Pr   Po   Pr   Pr
$t_d$ (s)          50   50   60   30   30   40   70   50   80   40   50   50   80   70   90
$t_c$ (s)          80   80   90   90   90   100  100  60   80   90   100  80   80   90   100

Prognostic  information  was  supplied  in  a  differential  form  as 
the  probability  of  reaching  EOL: 

dpEOL  =  j§ro T3dt, 

where  a  =  1 .5  Al.:,  for  the  battery  and  a  =  1 .3  for  the 
motor. 

The system probability of EOL was calculated as a weighted
sum of the two component EOL probabilities: $p_{system} =
0.8 p_b + 0.2 p_m$. In this case the battery failure was considered
to be a greater risk than a motor failure, as in the latter
case the possibility of achieving the objective remained by using
the remaining three motors. Minimization of risk of premature
failure was included in both the local and the global
components of the subsystem objective functions.
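As a small worked example of the weighting, the sketch below combines two component EOL probabilities using the 0.8 and 0.2 weights stated above. The input probabilities are made-up illustrative values, not results from the paper, and the function name is hypothetical.

```python
def system_eol_probability(p_battery: float, p_motor: float,
                           w_battery: float = 0.8,
                           w_motor: float = 0.2) -> float:
    """Weighted sum of the battery and motor EOL probabilities."""
    for p in (p_battery, p_motor):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return w_battery * p_battery + w_motor * p_motor


# Illustrative inputs only: the same total component risk, split differently.
battery_heavy = system_eol_probability(p_battery=0.10, p_motor=0.05)
motor_heavy = system_eol_probability(p_battery=0.05, p_motor=0.10)
```

With these inputs the battery-heavy case yields 0.09 and the motor-heavy case 0.06, showing how the weighting penalizes risk placed on the battery.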

11.2.3.  Experimental  results 

The output from one of the runs of the algorithm is presented
in Figure 9 and in Table 6. The top subplot of Figure 9 shows
the evolution of temperature constraints for the two subsystems
throughout the negotiation process. The middle subplot
shows the maximum distances achievable from each of the
subsystems' point of view. The process ends when both of the
subsystems are predicted to be capable of achieving $d_{min}$, albeit
with a higher risk of failure while doing so. The bottom
subplot shows the estimated risk of system failure for each
iteration of the algorithm. Table 6 shows which subsystem
had control of the process during each iteration and the
$(t_d, t_c)$ pair it proposed as the best solution. In the example
run given here, the final temperature limit for the battery
was found to be approximately 65.3 °C and the one for the
motor approximately 75.5 °C.

12.  Summary  and  Future  Work 

In this paper we outlined our approach to development of
prognostic decision making methods for aerospace applications.
First, definitions for prognostic decision making and
related concepts were suggested, then a few motivating examples
(highlighting potential use cases for PDM) were described.
The examples also helped to illustrate the general attributes
of the problem type we hope to address: (1) complex,
multi-component systems; (2) dynamic operating environments;
(3) degradation/fault modes that evolve in their characteristics
over time and have the potential of substantially affecting
system performance; and (4) decisions on mitigation
measures required in a finite amount of time. From there we
derived our set of high-level requirements for aerospace PDM
systems. With these requirements in mind, we reviewed related
prior efforts from the areas of prognostics-enabled control,
post-prognostic decision support, condition-based maintenance,
and automated contingency management. We then
explained our process for selecting suitable policy generation
techniques and presented a prototype algorithm that uses
probabilistic methods and prognostic information in generation
of action policies. The algorithm, PPG, was tested
against an exhaustive search algorithm on scenarios involving
a planetary rover prototype. We also considered the problem
where no feasible policies are found or where feasible policies
in the generated Pareto set are not sufficient for attaining
performance objectives, given the current system constraints.
We proposed that this problem has certain common characteristics
with problems from the field of Multidisciplinary Design
Optimization and reviewed some of the modern MDO
approaches for applicability. One of the approaches is based
on game-theoretic principles and served as a foundation for
the second algorithm presented, DCR. This algorithm sets up
a negotiating framework for subsystems to adjust their operating
constraints, if that becomes necessary for achievement of
high-importance system objectives. DCR was demonstrated
on a problem involving two subsystems, power and propulsion.
Figure 9. DCR output example


While it is not possible to cover all of the topics discussed
in sufficient detail in one paper, we hope that it provides a
good foundation for future efforts. The work done so far also
gave us a better appreciation for the challenges ahead. One of
them is developing more efficient multi-objective optimization
algorithms, given the high computational cost of a utility
function (simulation) call in a typical application. We plan to
continue our development of probabilistic optimization methods
and further investigate applicability of evolutionary algorithms.
Use of multi-fidelity models and response surfaces
for utility simulation will be researched as well.

For the problem of DCR, we plan to concentrate on the
following three goals: (1) extend the current cooperative
game DCR algorithm to greater numbers of players/subsystems;
(2) investigate other formulations, possibly
based on ideas in CO, CSSO, and BLISS; (3) develop methods
for selection of those constraints that offer the most system
benefit if revised (approaches based on Lagrangian Duality
appear promising for this purpose). We also hope that
decomposition formulations researched for DCR will
prove helpful for the prognostic policy generation work. Finally,
identifying and, if necessary, developing suitable performance
metrics will become more important as the complexity
of test scenarios and algorithms increases.